Up to this point we have been doing mostly image processing: \(F:I(x,y)\rightarrow I'(x,y)\).
That’s not why we’re here; we’re here to talk about computer vision! We want to put images in and get ‘cool stuff’ out.
Useful stuff we can find with computer vision:
Today we’ll focus on finding parametric models.
A parametric model can represent a class of instances where each is defined by a value of the parameters; examples of parametric models include lines, circles, or parameterized templates.
When trying to fit a parametric model / find a model in your image, keep in mind:
This is a simple example of line fitting.
It can be difficult going from edges to lines; how many lines do you see in the image?
It’s easy for a human to interpret the image in the lecture as having two, three, or even four lines.
Allow the data to speak for itself. Voting is a general technique where we let the features (little edge points) vote for all models compatible with them. We are going to:
Why does voting work? In every election there are voters unhappy with the main candidates who write in silly names, but those write-ins are normally distributed relatively evenly, like noise. Voting assumes that good candidates (i.e. nice, clean edges) will receive a majority of the votes. (Note that the 2016 American elections do not fit these assumptions.)
Today’s example is just fitting lines, but some questions need to be answered:
A Hough transform is a voting technique that can answer all of the questions. Each edge point votes for compatible lines and the algorithm looks for lines that get a lot of votes. Keeping track of which points vote for which lines also allows assigning points to individual lines!
The key to the Hough transform is Hough space. Given an image space of x and y with an equation of a line \(y=m_0x+b_0\), we can parameterize the line by \((m_0,b_0)\): a line in image space corresponds to a point in Hough space. Conversely, a point \((x_0,y_0)\) in image space lies on every line whose parameters satisfy \(y_0=mx_0+b\). With some simple algebraic rearrangement we can solve for \(b=-x_0m+y_0\) and recast the equation in terms of Hough space. A point in image space is a line in Hough space!
What if we have a second point \((x_1,y_1)\)? What in Hough space is consistent with both points? The intersection of the two Hough-space lines defined by the image-space points \((x_0,y_0)\) and \((x_1,y_1)\) is the \((m,b)\) of the image-space line that passes through both of them! This is how we find lines from points.
We create a grid discretized in m and b, made up of a set of bins; each point tallies a ‘vote’ in every bin its Hough-space line passes through. At the end, whichever bin has the most votes is the line!
Before this is implemented in real code we have to reconsider our representation of lines. A vertical line is very painful to represent because its slope is infinite. A more robust representation of a line is the polar representation!
The line in polar representation is parameterized by two parameters: \(d\) (the perpendicular distance from line to origin), and \(\theta\) (the angle which the normal line to the original line makes with the x-axis.)
Vector calculus / trigonometry helps yield a formula for \(d=x \cos(\theta) + y \sin(\theta)\).
A point in image space is now a sinusoidal segment in Hough space because of the way we’ve parameterized it. There’s a redundancy/ambiguity to resolve: if \(d\) is allowed to be either positive or negative, then \(\theta\) only needs to range from 0 to \(\pi\); if we require \(d>0\), then \(\theta\) must range from 0 to \(2\pi\). If d must lie inside your image then it’s even further restricted.
Using the polar parameterization and a Hough Accumulator array where the bins represent different values of d and \(\theta\) the inputs are:
If \(\theta\) goes from 0 to \(\pi\) and you have a bin for every degree, that’s 180 bins.
The basic Hough algorithm is:
For each edge point in \(E(x,y)\):
For each angle \(\theta\) from 0 to 180 degrees (0 to 360 degrees if \(d\) is restricted to be positive):
Compute \(d=x \cos(\theta) + y \sin(\theta)\) and add a vote to the accumulator bin \(H(d,\theta)\).
Finally, find the bins of \(H\) with the most votes; each detected line in the image is given by its bin’s \(d=x \cos(\theta) + y \sin(\theta)\).
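The loop above can be sketched in a few lines of NumPy (a minimal implementation, assuming a binary edge image, one bin per degree, and one bin per pixel of d; the function name `hough_lines` is illustrative):

```python
import numpy as np

def hough_lines(E, n_theta=180):
    """Polar-parameterized Hough transform of a binary edge image E."""
    h, w = E.shape
    d_max = int(np.ceil(np.hypot(h, w)))           # largest possible |d|
    thetas = np.deg2rad(np.arange(n_theta))        # 0..179 degrees
    # Rows index d in [-d_max, d_max]; signed d lets theta stop at 180.
    H = np.zeros((2 * d_max + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(E)                         # edge point coordinates
    for x, y in zip(xs, ys):
        ds = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        H[ds + d_max, np.arange(n_theta)] += 1     # one vote per theta bin
    return H, d_max

# Synthetic test: a horizontal line y = 10 in a 60x60 image.
E = np.zeros((60, 60), dtype=bool)
E[10, :] = True
H, d_max = hough_lines(E)
d_idx, theta_idx = np.unravel_index(H.argmax(), H.shape)
print(d_idx - d_max, theta_idx)   # peak at d = 10, theta = 90 degrees
```

The horizontal line’s normal points straight down the y-axis, so all 60 edge points vote into the single bin \((d,\theta) = (10, 90^\circ)\).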
How well does the algorithm work? What is the space complexity? How much memory is required to store it?
You need \(k^n\) bins (n dimensions with k bins each) to discretize the space. Adding parameters can be very expensive when each parameter increases the required space exponentially!
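To make the exponential growth concrete (assuming, say, 100 bins per parameter; the bin count is a hypothetical choice for illustration):

```python
# k bins per parameter, n parameters -> k**n accumulator bins.
k = 100
for model, n in [("line (d, theta)", 2),
                 ("circle (a, b, r)", 3),
                 ("general ellipse", 5)]:
    print(model, "->", k ** n, "bins")
```

Going from a line to a circle already multiplies the memory by a factor of 100; a general ellipse needs ten billion bins at this resolution.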
What about time complexity in terms of voting? It is linearly proportional to the number of edge points, and independent of the number of lines detected.
In the middle is a representation of a cartoon example. On the left there are a bunch of points arranged in a line. On the right are their votes, traced as sinusoidal segments in Hough space. The point \((d,\theta)\) in Hough space where the sinusoids intersect represents the line \(d = x \cos\theta + y \sin\theta\), the set of all points in image space that lie on that edge!
What would a square equate to in Hough space? There would be four lines, represented by four peaks in Hough space!
(The peak in the image above contains all the points in the ideal edge)
What does the bright spot in the Hough array below represent? It corresponds to the longest edge on the largest block in the image.
This is a very simple demo of how to create a Hough transform using Python and OpenCV.
import cv2
import math
import numpy as np
from cs6576helpers import imshow
# A recognizable image with some clearly (and some not so clearly) defined edges.
img = cv2.imread("images/08_12_01.PNG")
# Show off the image
imshow(img, title="Blade Runner")
# Convert the image to gray scale and show what movies looked like in the Hollywood Golden era, before most of you were born.
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Squint your eyes and lose focus juuuust a little bit. (Helps with the noise)
img2 = cv2.GaussianBlur(img2,(11,11),3)
# Show off the new blurry image
imshow(img2, title="Squinty Harrison")
# Do androids dream of electric sheep? Let's use a Canny edge detector to find out!
uncanny_harrison = cv2.Canny(img2, 60, 110)
imshow(uncanny_harrison,title="Electric Sheep")
# If we ran a Hough transform algorithm on it, what do we see?
# Inputs are rho, theta, threshold, and a couple of others.
# rho (double) is the 'Distance resolution of the accumulator in pixels' or a parameter of the discretization. We've been referring to this as 'd'
# theta (double) is the 'Angle resolution of the accumulator in radians' or a parameter of the discretization. We've been referring to this as $\theta$; note that the loop below specifies it in degrees and converts to radians.
# threshold (int) is the 'Accumulator threshold parameter. Only those lines are returned that get enough votes (> threshold)'.
# What are good values of rho and theta? Without knowing, why don't we try a grid to see what some values might be.
Rho = [0.2, 1.]
Theta = [1, 2, 4]
for r in Rho:
    for t in Theta:
        dst_image = np.copy(img)
        # The threshold in this picture is set quite high to discard noise.
        threshold = 80
        hough_harrison = cv2.HoughLines(uncanny_harrison, rho=r, theta=t*np.pi/180, threshold=threshold)
        if hough_harrison is not None:
            for line in hough_harrison:
                rho = line[0][0]
                theta = line[0][1]
                a = math.cos(theta)
                b = math.sin(theta)
                x0 = a * rho
                y0 = b * rho
                # Extend the (infinite) line far past the image borders for drawing.
                pt1 = (int(x0 + 1000*(-b)), int(y0 + 1000*(a)))
                pt2 = (int(x0 - 1000*(-b)), int(y0 - 1000*(a)))
                cv2.line(dst_image, pt1, pt2, (0, 0, 255), 1, cv2.LINE_AA)
        imshow(dst_image, title="Rho = {}, Theta = {}*pi/180".format(r, t))
The workflow for this algorithm is:
How can we get better results? Examining the edge pixels we can remove noise by
Here is an example of a Hough transform on a real image to show where it is strong and weak. Running it on an image of a football field shows the peaks that are found and a lot of peaks are pretty close together, indicating that they have the same angle and same location. Overlaying them on the football field shows the good and bad news.
It found a lot of lines! It also missed a lot of lines. The longest line segment it found (the Hough transform finds infinite lines; line segments are found by other operations) was not long!
What effect does noise have? It obviously does affect the Hough votes, lowering the precision of the peaks. Changing bin sizes, neighborhood sizes, or peak criteria, or even smoothing the Hough-space image, can recover peaks from a noisy image.
What happens with a LOT of noise? You’re going to find edges that don’t exist! It’s helpful if you know which lines exist but if you don’t know (or you’re creating an automated algorithm) then that’s more problematic.
One of the largest extensions to the Hough algorithm is using the gradient.
Using the gradient helps downselect potential voters which minimizes noise and greatly reduces the time complexity of the voting process.
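A sketch of gradient-directed voting (assuming precomputed gradient images gx and gy, e.g. from a Sobel filter; the function name and single-vote scheme are illustrative, not from the lecture):

```python
import numpy as np

def hough_with_gradient(E, gx, gy, d_max, n_theta=180):
    """Each edge point casts ONE vote, at its gradient orientation,
    instead of voting in all n_theta bins."""
    H = np.zeros((2 * d_max + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(E)
    for x, y in zip(xs, ys):
        # The line's normal direction is the gradient direction; fold into [0, pi).
        theta = np.arctan2(gy[y, x], gx[y, x]) % np.pi
        t = int(np.degrees(theta)) % n_theta
        d = int(round(x * np.cos(theta) + y * np.sin(theta)))
        H[d + d_max, t] += 1          # one vote, not 180
    return H

# Synthetic test: a vertical line x = 5 whose gradient points along +x.
E = np.zeros((20, 20), dtype=bool)
E[:, 5] = True
gx = np.ones((20, 20))
gy = np.zeros((20, 20))
d_max = 29
H = hough_with_gradient(E, gx, gy, d_max)
print(H[5 + d_max, 0])   # all 20 points vote into the single bin (d=5, theta=0)
```

The loop now does O(1) work per edge point instead of O(number of theta bins), which is exactly the speedup the paragraph above describes.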
Another extension is weighting votes. A strong edge might get more votes!
Yet another would be changing the sampling (bin size) of the Hough parameters to change the resolution. You could do this hierarchically, moving from a coarse to a fine discretization.
The same procedure can be used with circles, squares, or any other shape defined by a template by changing the parameterization.
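For example, a circle of known radius r is a two-parameter model \((a,b)\): each edge point \((x,y)\) votes for every center lying on the circle of radius r around it. A sketch assuming a binary edge image and a known radius (the function name and angle sampling are illustrative):

```python
import numpy as np

def hough_circle(E, r, n_angles=360):
    """Vote for circle centers (a, b), given a known radius r."""
    h, w = E.shape
    H = np.zeros((h, w), dtype=int)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    ys, xs = np.nonzero(E)
    for x, y in zip(xs, ys):
        # Candidate centers lie on a circle of radius r around (x, y).
        a = np.round(x - r * np.cos(angles)).astype(int)
        b = np.round(y - r * np.sin(angles)).astype(int)
        ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
        np.add.at(H, (b[ok], a[ok]), 1)    # unbuffered: duplicates all count
    return H

# Synthetic test: a rasterized circle of radius 8 centered at (15, 15).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
E = np.zeros((31, 31), dtype=bool)
E[np.round(15 + 8 * np.sin(t)).astype(int),
  np.round(15 + 8 * np.cos(t)).astype(int)] = True
H = hough_circle(E, r=8)
b, a = np.unravel_index(H.argmax(), H.shape)
print(a, b)   # peak near the true center (15, 15)
```

If the radius is unknown, r becomes a third accumulator dimension, which is exactly where the \(k^n\) memory cost discussed earlier starts to bite.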
One of the main lessons from this course: Nothing works like it’s supposed to.